DFS-Perf: A Scalable and Unified Benchmarking Framework for Distributed File Systems
Authors
Abstract
A distributed file system (DFS) is a key component of virtually any cluster computing system. The performance of such a system depends heavily on the design and deployment of the underlying DFS. As a result, it is critical to characterize the performance and design trade-offs of DFSes with respect to cluster configurations and real-world workloads. To this end, we present DFS-Perf, a scalable, extensible, and low-overhead benchmarking framework to evaluate the properties and the performance of various DFS implementations. DFS-Perf uses a highly parallel architecture to cover a large variety of workloads at different scales, and provides an extensible interface to incorporate user-defined workloads and integrate with various DFSes. As a proof of concept, our current DFS-Perf implementation includes several built-in benchmarks and workloads, including machine learning and SQL applications. We present performance comparisons of four state-of-the-art DFS designs, namely Alluxio, CephFS, GlusterFS, and HDFS, on a cluster with 40 nodes (960 cores). We demonstrate that DFS-Perf can provide guidance on existing DFS designs and implementations, while adding 5.7% overhead.
Similar resources
Dynamic configuration and collaborative scheduling in supply chains based on scalable multi-agent architecture
Due to diversified and frequently changing demands from customers, technological advances and global competition, manufacturers rely on collaboration with their business partners to share costs, risks and expertise. How to take advantage of advancement of technologies to effectively support operations and create competitive advantage is critical for manufacturers to survive. To respond to these...
GUPFS: The Global Unified Parallel File System Project at NERSC
The Global Unified Parallel File System (GUPFS) project is a five-year project to provide a scalable, high-performance, high-bandwidth, shared file system for the National Energy Research Scientific Computing Center (NERSC). This paper presents the GUPFS testbed configuration, our benchmarking methodology, and some preliminary results.
P2P Network Trust Management Survey
Peer-to-peer applications (P2P) are no longer limited to home users, and start being accepted in academic and corporate environments. While file sharing and instant messaging applications are the most traditional examples, they are no longer the only ones benefiting from the potential advantages of P2P networks. For example, network file storage, data transmission, distributed computing, and co...
A Survey: Load Balancing for Distributed File System
Distributed systems are useful for computation and storage of large-scale data at dispersed locations. A Distributed File System (DFS) is a subsystem of a distributed system. A DFS is a means of sharing storage space and data. Servers, storage devices, and clients are at dispersed locations in a DFS. Fault tolerance and scalability are two main features of a distributed file system. Performance of DFS is...
The Edge Node File System: A Distributed File System for High Performance Computing
The concept of using Internet edge nodes for High Performance Computing (HPC) applications has gained acceptance in recent times. Many of these HPC applications also have large I/O requirements. Consequently, an edge node file system that efficiently manages the large number of files involved can assist in improving application performance significantly. In this paper, we discuss the design of ...
Journal title:
Volume, issue:
Pages: -
Publication date: 2016